    BiSon-e: a lightweight and high-performance accelerator for narrow integer linear algebra computing on the edge

    Linear algebra computational kernels based on byte and sub-byte integer data formats underpin many classes of applications, ranging from Deep Learning to Pattern Matching. Porting these applications from the cloud to edge and mobile devices would enable significant improvements in security, safety, and energy efficiency. However, despite their low memory and energy demands, the intrinsically high computational intensity of these workloads makes them challenging to execute on highly resource-constrained devices. In this paper, we present BiSon-e, a novel RISC-V-based architecture that accelerates linear algebra kernels based on narrow integer computations on edge processors by performing Single Instruction Multiple Data (SIMD) operations on off-the-shelf scalar Functional Units (FUs). Our architecture is built upon the binary segmentation technique, which significantly reduces the memory footprint and the arithmetic intensity of linear algebra kernels that require narrow data sizes. We integrate BiSon-e into a complete RISC-V-based System-on-Chip (SoC), synthesized and placed and routed in 65nm and 22nm technologies, introducing a negligible 0.07% area overhead with respect to the baseline architecture.
Our experimental evaluation shows that, when computing the Convolution and Fully-Connected layers of the AlexNet and VGG-16 Convolutional Neural Networks (CNNs) with 8-, 4-, and 2-bit data, our solution achieves speedups in execution time of up to 5.6×, 13.9×, and 24×, respectively, compared to the scalar implementation on a single RISC-V core, and improves the energy efficiency of string matching tasks by 5× compared to a RISC-V-based Vector Processing Unit (VPU).

This research was supported by the European Union Regional Development Fund within the framework of the ERDF Operational Program of Catalonia 2014-2020 with a grant of 50% of the total eligible cost, under the DRAC project [001-P-001723], and by the Spanish State Research Agency - Ministry of Science and Innovation (contract PID2019-107255GB). This research was also supported by the grant PRE2020-095272 funded by MCIN/AEI/10.13039/501100011033 and by “ESF Investing in your future”.

Peer reviewed. Postprint (author's final draft).

    Adaptable register file organization for vector processors

    Contemporary Vector Processors (VPs) are designed either for short vector lengths, e.g., Fujitsu A64FX with 512-bit ARM SVE vector support, or for long vectors, e.g., NEC Aurora Tsubasa with a 16 Kbit Maximum Vector Length (MVL). Unfortunately, both approaches have drawbacks. On the one hand, short-vector VP designs struggle to provide high efficiency for applications featuring long vectors with high Data Level Parallelism (DLP). On the other hand, long-vector VP designs waste resources and underutilize the Vector Register File (VRF) when executing low-DLP applications with short vector lengths. As a result, long-vector VP implementations are limited to a specialized subset of applications in which relatively high DLP must be present to achieve excellent performance with high efficiency. Modern scientific applications are increasingly diverse, and their vector lengths vary widely. To overcome these limitations, we propose an Adaptable Vector Architecture (AVA) that offers the best of both worlds. AVA is designed for short vectors (MVL = 16 elements) and is thus area- and energy-efficient. However, AVA can reconfigure the MVL, allowing it to exploit the benefits of a long-vector microarchitecture with vectors of up to 128 elements when abundant DLP is present. We model AVA on the gem5 simulator and evaluate its performance with six applications taken from the RiVEC Benchmark Suite. To obtain area and power consumption metrics, we model AVA on McPAT for 22nm technology. Our results show that by reconfiguring our small VRF (8KB), together with our novel issue queue scheme, AVA yields a 2× speedup over the default configuration for short vectors. Additionally, AVA shows competitive performance compared to a long-vector VP while saving 50% of the area.

Research reported in this publication is partially supported by CONACyT Mexico under Grant No. 472106, the Spanish State Research Agency - Ministry of Science and Innovation (contract PID2019-107255GB), and the European Union Regional Development Fund within the framework of the ERDF Operational Program of Catalonia 2014-2020 with a grant of 50% of the total eligible cost, under the DRAC project [001-P-001723].

Peer reviewed. Postprint (author's final draft).
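The trade-off AVA targets can be made concrete with a little arithmetic. The Python sketch below assumes 64-bit (double-precision) elements, which the abstract does not specify, and uses hypothetical helper names. It only shows how a fixed 8KB VRF splits into fewer, longer registers as the MVL grows, and how many strip-mined vector iterations a loop needs at each MVL; it does not describe how AVA actually maps the architectural RVV registers onto its physical storage.

```python
VRF_BYTES = 8 * 1024  # 8KB VRF, as stated in the abstract
ELEM_BYTES = 8        # assumption: 64-bit (double-precision) elements


def physical_registers(mvl, vrf_bytes=VRF_BYTES, elem_bytes=ELEM_BYTES):
    """Number of mvl-element vector registers that fit in the VRF."""
    return vrf_bytes // (mvl * elem_bytes)


def strip_mine(n, mvl):
    """Vector lengths used to process n elements, one entry per vector iteration.

    Each step takes vl = min(remaining, mvl), mirroring the common
    vl = min(AVL, VLMAX) behaviour of RVV's vsetvl, so every iteration
    except possibly the last one is full.
    """
    lengths = []
    remaining = n
    while remaining > 0:
        vl = min(remaining, mvl)
        lengths.append(vl)
        remaining -= vl
    return lengths


if __name__ == "__main__":
    for mvl in (16, 128):
        print(f"MVL={mvl}: {physical_registers(mvl)} registers, "
              f"{len(strip_mine(1000, mvl))} iterations for n=1000")
```

For a 1,000-element loop, MVL = 16 needs 63 vector iterations while MVL = 128 needs 8, but under these assumptions the same 8KB then holds only 8 registers instead of 64, which is why a reconfigurable MVL pays off only when DLP is abundant.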

    Vitruvius+: An area-efficient RISC-V decoupled vector coprocessor for high performance computing applications

    The maturity level of RISC-V and the availability of domain-specific instruction set extensions, such as vector processing, make RISC-V a good candidate for integrating specialized hardware into processor cores for the High Performance Computing (HPC) application domain. In this article, we present Vitruvius+, the vector processing acceleration engine at the core of vector instruction execution in the HPC challenge within the EuroHPC initiative. It implements the RISC-V vector extension (RVV) 0.7.1 and can be easily connected to a scalar core using the Open Vector Interface standard. Vitruvius+ natively supports long vectors: 256 double-precision floating-point elements in a single vector register. It is composed of a set of identical vector pipelines (lanes), each containing a slice of the Vector Register File and functional units (one integer, one floating-point). The vector instruction execution scheme is hybrid in-order/out-of-order and is supported by register renaming and arithmetic/memory instruction decoupling. In stand-alone synthesis, Vitruvius+ reaches a maximum frequency of 1.4 GHz under typical conditions (TT/0.80V/25°C) using GlobalFoundries 22FDX FD-SOI. The silicon implementation has a total area of 1.3 mm² and a maximum estimated power of ~920 mW for one instance of Vitruvius+ equipped with eight vector lanes.

This research has received funding from the European High Performance Computing Joint Undertaking (JU) under Framework Partnership Agreement No 800928 (European Processor Initiative) and Specific Grant Agreement No 101036168 (EPI SGA2). The JU receives support from the European Union’s Horizon 2020 research and innovation programme and from Croatia, France, Germany, Greece, Italy, Netherlands, Portugal, Spain, Sweden, and Switzerland. The EPI-SGA2 project, PCI2022-132935, is also co-funded by MCIN/AEI/10.13039/501100011033 and by the EU NextGenerationEU/PRTR. This work has also been partially supported by the Spanish Ministry of Science and Innovation (PID2019-107255GB-C21/AEI/10.13039/501100011033).

Peer reviewed. Postprint (author's final draft).
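To picture what "each lane contains a slice of the Vector Register File" means for a 256-element register, the Python sketch below assumes an element-interleaved mapping (element i lives in lane i mod 8). That layout is common in lane-based vector units, but the abstract does not confirm it for Vitruvius+. The helper names are hypothetical, and the byte count covers only the 32 architectural RVV vector registers, not the extra physical registers that renaming requires.

```python
LANES = 8           # eight-lane instance described in the abstract
VLEN_ELEMS = 256    # double-precision elements per vector register
ELEM_BYTES = 8      # double precision
ARCH_VREGS = 32     # architectural vector registers defined by RVV


def lane_of(i, lanes=LANES):
    """Lane holding element i under the assumed interleaved mapping."""
    return i % lanes


def slot_in_lane(i, lanes=LANES):
    """Position of element i inside its lane's slice of the register."""
    return i // lanes


def lane_slice(lane, elems=VLEN_ELEMS, lanes=LANES):
    """Element indices of one register that a given lane stores."""
    return list(range(lane, elems, lanes))


REGISTER_BYTES = VLEN_ELEMS * ELEM_BYTES               # bytes per vector register
BYTES_PER_LANE = ARCH_VREGS * REGISTER_BYTES // LANES  # architectural state per lane

if __name__ == "__main__":
    print(lane_of(9), slot_in_lane(9))     # 1 1
    print(len(lane_slice(3)))              # 32
    print(REGISTER_BYTES, BYTES_PER_LANE)  # 2048 8192
```

Under this layout, element-wise operations such as a vector add keep every lane busy on its own 32 elements without cross-lane traffic; reductions and permutations are the operations that must move data between lanes.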

    DVINO: A RISC-V vector processor implemented in 65nm technology

    This paper describes the design, verification, implementation, and fabrication of the Drac Vector IN-Order (DVINO) processor, a RISC-V vector processor capable of booting Linux, jointly developed by BSC, CIC-IPN, IMB-CNM (CSIC), and UPC. The DVINO processor includes an internally developed two-lane vector processor unit as well as a Phase Locked Loop (PLL) and an Analog-to-Digital Converter (ADC). The paper summarizes the design from the architectural level through logic synthesis and physical design in 65nm CMOS technology.

The DRAC project is co-financed by the European Union Regional Development Fund within the framework of the ERDF Operational Program of Catalonia 2014-2020 with a grant of 50% of the total eligible cost. The authors are part of RedRISCV, which promotes activities around open hardware. The Lagarto Project is supported by the Research and Graduate Secretary (SIP) of the Instituto Politecnico Nacional (IPN) from Mexico, and by the CONACyT scholarship for the Center for Research in Computing (CIC-IPN).

Peer reviewed. Article signed by 43 authors: Guillem Cabo∗, Gerard Candón∗, Xavier Carril∗, Max Doblas∗, Marc Domínguez∗, Alberto González∗, Cesar Hernández†, Víctor Jiménez∗, Vatistas Kostalampros∗, Rubén Langarita∗, Neiel Leyva†, Guillem López-Paradís∗, Jonnatan Mendoza∗, Francesco Minervini∗, Julian Pavón∗, Cristobal Ramírez∗, Narcís Rodas∗, Enrico Reggiani∗, Mario Rodríguez∗, Carlos Rojas∗, Abraham Ruiz∗, Víctor Soria∗, Alejandro Suanes‡, Iván Vargas∗, Roger Figueras∗, Pau Fontova∗, Joan Marimon∗, Víctor Montabes∗, Adrián Cristal∗, Carles Hernández∗, Ricardo Martínez‡, Miquel Moretó∗§, Francesc Moll∗§, Oscar Palomar∗§, Marco A. Ramírez†, Antonio Rubio§, Jordi Sacristán‡, Francesc Serra-Graells‡, Nehir Sonmez∗, Lluís Terés‡, Osman Unsal∗, Mateo Valero∗§, Luís Villa†.

∗ Barcelona Supercomputing Center (BSC), Barcelona, Spain; † Centro de Investigación en Computación, Instituto Politécnico Nacional (CIC-IPN), Mexico City, Mexico; ‡ Institut de Microelectronica de Barcelona, IMB-CNM (CSIC), Spain; § Universitat Politecnica de Catalunya (UPC), Barcelona, Spain.

(Author's final draft.)

    Low-Power and Compact CMOS Circuit Design of Digital Pixel Sensors for X-Ray Imagers

    X-ray imaging has become a key enabling technology for a wide range of industrial, medical, and scientific applications, since it allows the inside of objects to be studied without destroying or dismantling them. Accordingly, as the literature shows, there is growing research interest in developing advanced X-ray systems capable of obtaining high-quality images while reducing the total radiation dose. Currently, X-ray imagers are dominated by hybrid systems, built from a pixel array of direct-conversion X-ray detectors and its corresponding readout integrated circuit (ROIC). Despite their higher cost and limited area compared to classical indirect-conversion counterparts, these systems offer clear advantages in radiation dose reduction, signal integrity, and spatial resolution scaling. Concerning the readout method used by ROICs, the most common design strategy is photon counting, owing to its advantages in circuit noise immunity and photon classification. However, these X-ray imaging systems tend to suffer from information losses caused by charge-sharing and pile-up effects. In this context, the goal of this thesis is to propose specific analog and mixed-signal circuit techniques for the full-custom CMOS design of low-power, compact-pitch digital pixel sensors (DPS) for ROICs targeting hybrid, direct-conversion X-ray imagers.

The proposed pixel architecture, based on the charge-integration readout method, avoids the information losses experienced by photon counting and contributes to X-ray image quality through a compact pixel area, which improves image resolution, and low power consumption, which reduces heating of the X-ray detector. The proposed CMOS DPS circuits feature in-pixel lossless A/D charge conversion for extended dynamic range, individual gain tuning for pixel-array fixed-pattern noise (FPN) compensation, self-biasing capability and a digital-only interface for inter-pixel crosstalk reduction, built-in test capability for cost reduction, selectable electron/hole collection to widen the range of applications, and in-pixel dark current cancellation. Furthermore, the proposed design techniques are oriented toward the future development of truly 2D modular X-ray imager systems with large-scale, seamless sensing areas.

All of this circuit design research has materialized in several generations of DPS demonstrators, with pitch values ranging from 100μm down to 52μm, all integrated in a standard 0.18μm 1P6M CMOS technology. Extensive electrical and X-ray measurements of the pixel circuit prototypes have been carried out to prove their validity. Experimental results place this work at and beyond the state of the art for active pixels in terms of spatial resolution, power consumption, linearity, SNR, and pixel flexibility. This last point makes the proposed pixel design techniques especially suitable for a wide range of X-ray imaging applications.
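The abstract does not detail how the in-pixel lossless A/D charge conversion works. One generic way to obtain lossless, extended-range charge integration is charge-packet counting: the integrator subtracts a fixed charge packet each time it reaches a threshold and increments a counter, and the charge left over at the end of the frame is digitized by a small ADC. The Python sketch below models that generic scheme with hypothetical names and integer charges in electrons; it is an assumption offered for illustration, not the thesis's circuit.

```python
def integrate_frame(charges_e, q_packet_e, residue_bits):
    """Model a generic charge-packet-counting pixel over one frame.

    charges_e: collected charge per time step, in electrons (non-negative ints).
    q_packet_e: charge removed from the integrator per counter increment.
    residue_bits: resolution of the ADC that digitizes the leftover charge.
    Returns (packet_count, residue_code).
    """
    if q_packet_e % (1 << residue_bits):
        raise ValueError("q_packet_e must be divisible by 2**residue_bits")
    count = 0
    integrator = 0
    for q in charges_e:
        integrator += q
        # Remove whole packets instead of saturating, so no charge is discarded.
        while integrator >= q_packet_e:
            integrator -= q_packet_e
            count += 1
    lsb = q_packet_e >> residue_bits
    return count, integrator // lsb  # residue is quantized once, at frame end


def reconstruct_charge(count, residue_code, q_packet_e, residue_bits):
    """Estimate total frame charge; the error is below one residue LSB."""
    lsb = q_packet_e >> residue_bits
    return count * q_packet_e + residue_code * lsb


if __name__ == "__main__":
    frame = [3000, 12000, 500]  # 15500 e- in total
    count, code = integrate_frame(frame, q_packet_e=4096, residue_bits=4)
    print(count, code, reconstruct_charge(count, code, 4096, 4))  # 3 12 15360
```

In hardware the counter width, rather than the integrator's voltage swing, would bound the total charge per frame, which is how such a scheme extends dynamic range without discarding charge.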

    An academic RISC-V silicon implementation based on open-source components

    ©2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

The design presented in this paper, called preDRAC, is a RISC-V general-purpose processor capable of booting Linux, jointly developed by BSC, CIC-IPN, IMB-CNM (CSIC), and UPC. The preDRAC processor is the first RISC-V processor designed and fabricated by a Spanish or Mexican academic institution, and it will be the basis of future RISC-V designs jointly developed by these institutions. This paper summarizes the design tasks, first for FPGA and later for SoC, from high-level architectural descriptions down to RTL, and then through logic synthesis and physical design to obtain a layout ready for its final tapeout in 65nm CMOS technology.

The DRAC project is co-financed by the European Union Regional Development Fund within the framework of the ERDF Operational Program of Catalonia 2014-2020 with a grant of 50% of the total eligible cost. The authors are part of RedRISCV, which promotes activities around open hardware. The Lagarto Project is supported by the Research and Graduate Secretary (SIP) of the Instituto Politecnico Nacional (IPN) from Mexico, and by the CONACyT scholarship for the Center for Research in Computing (CIC-IPN).

Peer reviewed. Postprint (author's final draft).